
[WIP] support for customizing LoRA weights through the sdapi #1982

Draft

wbruna wants to merge 1 commit into LostRuins:concedo_experimental from wbruna:kcpp_sdapi_loras

Conversation


@wbruna wbruna commented Feb 19, 2026

This is still just an idea!

Since we just got support for multiple LoRAs, we could include LoRA customization on the API side, by:

  • internally allowing the weights to be changed at generation time
  • showing the preloaded LoRAs under /sdapi/v1/loras
  • accepting weight changes for the preloaded LoRAs through the lora field at /sdapi/v1/txt2img and /sdapi/v1/img2img
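As a sketch of what the request side might look like (the field name and its shape here are my assumptions; the idea above only says weight changes would go through the lora field):

```python
def build_txt2img_payload(prompt, lora_weights):
    """Build a hypothetical /sdapi/v1/txt2img request body that overrides
    the weights of preloaded LoRAs. The exact shape of the "lora" field
    is an assumption for illustration, not a final API."""
    return {
        "prompt": prompt,
        # One entry per preloaded LoRA whose weight should change:
        "lora": [{"name": n, "weight": w} for n, w in lora_weights.items()],
    }

payload = build_txt2img_payload("a photo of a cat", {"style_lora": 0.8})
# POST this as JSON to /sdapi/v1/txt2img; LoRAs not listed would keep
# their preloaded weights.
```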

I recently implemented support on my Python client script for the mainline sd-server implementation, so I have a reasonable idea about how complicated that would be. I'm also aware that the sd.cpp C API would have to be adapted to allow changing LoRA weights without reloading the models.

Do you think this would be worth implementing?

@LostRuins (Owner)

Does it have any implications on memory use or runtime file loading?

wbruna (Author) commented Feb 19, 2026

For at_runtime LoRA mode, I believe it wouldn't change at all.

For immediately LoRA mode, it could mean higher memory usage: currently, the code may be unloading the weights right after applying them, since they wouldn't be needed anymore (I need to check the code to be sure). And to change the weights, we'd need them back in memory, either by reloading from disk or by keeping them around in RAM. Generation latency would also increase a bit, because we'd need to reapply the LoRAs (though only when a weight changes).

henk717 (Collaborator) commented Feb 26, 2026

Personally, I have seen this request a few times; there is demand for it. If it's a bit slower during a switch, that is better than not having it at all. Just make sure nothing changes if it's not used.

wbruna (Author) commented Feb 27, 2026

Got a first somewhat-working version.

I've included code for the <lora:name:weight> syntax in the prompt, to make testing easier. The API code is implemented, but I haven't tested it yet.
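For reference, the <lora:name:weight> convention can be handled with a small regex; this is just a sketch of the syntax, not the code in the branch:

```python
import re

# Matches the A1111-style <lora:name:weight> tag mentioned above.
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9.+-]+)>")

def extract_lora_tags(prompt):
    """Strip <lora:name:weight> tags from a prompt and collect the
    requested weights per LoRA name."""
    weights = {}

    def _collect(match):
        weights[match.group(1)] = float(match.group(2))
        return ""  # drop the tag from the prompt text

    cleaned = LORA_TAG.sub(_collect, prompt).strip()
    return cleaned, weights

print(extract_lora_tags("a cat <lora:style:0.7>"))
# → ('a cat', {'style': 0.7})
```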

As suspected, immediately LoRA mode discards the weights as soon as they are applied (lora->free_params_buffer() in apply_loras_immediately). So we need to either remove that call (as I've done for now), or restrict changing LoRA weights to the at_runtime mode. One way to do that without an extra command-line flag could be:

  • allow --sdloramult to receive a list of multipliers
  • LoRAs with multiplier != 0 would have fixed weights, as they are now
  • LoRAs with multiplier 0 would be allowed runtime multiplier changes (through the sdapi and/or prompt). We could also add a parameter to gendefaults to still be able to set a default non-zero multiplier for them
  • the presence of any customizable LoRA would force at_runtime mode. This way, we could keep the free_params_buffer call as-is, so setups with no customizable LoRAs would keep working as they are now, with no extra memory usage. It may even be possible to force at_runtime only for the customizable LoRAs.
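The multiplier-list rule above could be sketched roughly like this (function name, return shape, and the fallback mode are illustrative assumptions, not the branch's code):

```python
def classify_loras(multipliers, configured_mode="immediately"):
    """Sketch of the proposed --sdloramult semantics: a nonzero
    multiplier fixes that LoRA's weight at load time; a multiplier of 0
    marks it as runtime-customizable. Any customizable LoRA forces
    at_runtime mode; otherwise the configured mode is kept."""
    fixed = [i for i, m in enumerate(multipliers) if m != 0]
    customizable = [i for i, m in enumerate(multipliers) if m == 0]
    mode = "at_runtime" if customizable else configured_mode
    return fixed, customizable, mode

print(classify_loras([0.7, 0, 1.0]))
# → ([0, 2], [1], 'at_runtime')
```

With no zero multipliers, the configured mode is returned unchanged, so existing setups would keep their current behavior and memory usage.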

What do you think?

wbruna (Author) commented Feb 27, 2026

By the way, it's also possible to support the <lora:name:weight> syntax as a UI feature, parsing and converting it to the sdapi parameter on the stable-ui side. I'm not exactly looking forward to implementing it that way, but it would make sense from a compatibility POV, since it'd allow LoRA loading from other sdapi servers too.
